Loading Geoscience Australia's Sentinel-1 IW Backscatter from the dev Open Data Cube (collection 0)¶
This notebook demonstrates key steps for using Python to load preliminary Sentinel-1 IW backscatter products developed by Geoscience Australia.
For Sentinel-1, Geoscience Australia's Digital Earth (DE) branch is currently offering a suite of experimental products that we are calling collection 0, with sample data available over parts of Australia and Antarctica. These products are a collaborative effort between Digital Earth Australia and Digital Earth Antarctica.
If you have any questions, please reach out to the Digital Earth Antarctica team at digitalearthantarctica@ga.gov.au.
Table of Contents¶
- Set-up
- Working with Datacube
- Searching and loading
- Transforming and visualising
- Visualising
- Speckle filtering
- Masking
- Converting to decibels
- Exporting
import matplotlib.pyplot as plt
import numpy as np
import datacube
from odc.geo import BoundingBox
from odc.geo.xr import write_cog
import pathlib
from pystac_client import Client
import xarray as xr
Working with Datacube¶
The Sandbox environment has direct access to Digital Earth's Open Data Cube, allowing users to connect via the datacube library.
The first step is to establish a connection to the datacube.
dc = datacube.Datacube(app='S1_Backscatter')
View available collections¶
GA's Sentinel-1 data is published according to the polarisation mode used to capture the data. At this time, we publish data captured in Interferometric Wide (IW) mode. Collection 0 includes experimental data for three products:
- VV+VH: ga_s1_iw_vv_vh_c0
- VV: ga_s1_iw_vv_c0
- HH: ga_s1_iw_hh_c0
You can see the distribution of captured data over time and space in the DEA Dev Explorer:
dc_products = dc.list_products()
dc_products.loc[["ga_s1_iw_vv_vh_c0", "ga_s1_iw_vv_c0", "ga_s1_iw_hh_c0"]]
| name | description | license | default_crs | default_resolution |
|---|---|---|---|---|
| ga_s1_iw_vv_vh_c0 | Geoscience Australia Sentinel-1 Interferometri... | CC-BY-4.0 | None | None |
| ga_s1_iw_vv_c0 | Geoscience Australia Sentinel-1 Interferometri... | CC-BY-4.0 | None | None |
| ga_s1_iw_hh_c0 | Geoscience Australia Sentinel-1 Interferometri... | CC-BY-4.0 | None | None |
Troubleshooting¶
If you don't see any products, make sure you're working in the Dev Sandbox. The collection 0 products are only indexed in the dev datacube.
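As a quick sanity check, you can filter the frame returned by dc.list_products() for the collection 0 product names. The helper below is our own illustration (not part of the datacube API), shown here with a stand-in DataFrame; with a live connection you would pass dc.list_products() instead.

```python
import pandas as pd

def find_s1_products(products: pd.DataFrame) -> list:
    """Return the names of any indexed Sentinel-1 IW products."""
    return sorted(products.loc[products["name"].str.startswith("ga_s1_iw"), "name"])

# Stand-in for dc.list_products() output
demo = pd.DataFrame({"name": ["ga_ls8c_ard_3", "ga_s1_iw_vv_vh_c0", "ga_s1_iw_hh_c0"]})
find_s1_products(demo)  # → ['ga_s1_iw_hh_c0', 'ga_s1_iw_vv_vh_c0']
```

If the returned list is empty on a live connection, you are likely connected to a datacube that does not index the collection 0 products.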
Searching and Loading¶
This section provides examples for both Australia and Antarctica, as there are some differences between the two. Differences in polarisation are due to Sentinel-1's observation scenarios. The table below captures the primary differences in the collection 0 offering:
| Property | Australia | Antarctica |
|---|---|---|
| Primary capture mode | IW Vertical Dual-Polarisation (VV+VH) | IW Horizontal Single-Polarisation (HH) |
| Primary product | ga_s1_iw_vv_vh_c0 | ga_s1_iw_hh_c0 |
| Recommended CRS (metres) | UTM or EPSG:3577 | EPSG:3031 |
| Group by method | Solar day | Scene ID |
In Australia, there are a small number of IW Horizontal and Vertical Single-Polarisation acquisitions over south-east Queensland, but neither is the primary capture mode, and the collection 0 product only contains limited data from 2024.
Australia¶
Here, we use a bounding box over Lake Sorell and Lake Crescent in Tasmania, and a date range of June 2020 through mid-July 2020.
aus_bbox = BoundingBox(
left=147.07251,
bottom=-42.22120,
right=147.24274,
top=-42.03035,
crs="EPSG:4326"
)
aus_start_date = "2020-06-01"
aus_end_date = "2020-07-15"
To load the data, we use dc.load. With the dc.load command, you can specify:
- product: the product, or list of products, to load from.
- lon: the longitude range to load the data for. If not specified, the product's maximum range will be used.
- lat: the latitude range to load the data for. If not specified, the product's maximum range will be used.
- time: the date range to load the data for. If not specified, the product's maximum range will be used.
- measurements: the measurements to load from the data (e.g. VV). If not specified, all measurements will be loaded.
- group_by: how to group loaded data. For Australia, the value solar_day will ensure all bursts captured on the same day are grouped together under one time-stamp (this groups bursts from multiple scenes if captured on the same day).
- output_crs: the coordinate reference system (CRS) to project loaded data to. If not specified, the data's native CRS will be used.
- resolution: the resolution to load the data at, in the same units as the chosen CRS. If not specified, the data's native resolution will be used.
- dask_chunks={}: request that the data be lazily loaded: an xarray showing the expected dimensions and measurements will be returned, and data will be computed when used. If not specified, data will be loaded into memory.
Note: When selecting a CRS for data over Australia, we recommend "EPSG:3577" to get data back in a coordinate system that uses metres. Alternatively, you can use the UTM zone CRS for your area of interest. In this case, this is "EPSG:32755", which we use for consistency with the STAC notebook.
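If you are unsure which UTM zone covers your area of interest, the zone can be derived from longitude and latitude. The helper below is our own (not part of odc-geo or datacube) and ignores the Norway and Svalbard zone exceptions, which do not affect Australia or Antarctica.

```python
import math

def utm_epsg(lon: float, lat: float) -> str:
    """Derive the WGS84 UTM zone EPSG code for a lon/lat point."""
    zone = int(math.floor((lon + 180) / 6) % 60) + 1
    base = 32600 if lat >= 0 else 32700  # northern vs southern hemisphere series
    return f"EPSG:{base + zone}"

utm_epsg(147.15, -42.12)  # centre of the Tasmanian bounding box → "EPSG:32755"
```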
# Lazy load our filtered data
aus_ds = dc.load(
product="ga_s1_iw_vv_vh_c0",
x=(aus_bbox.left, aus_bbox.right),
y=(aus_bbox.bottom, aus_bbox.top),
time=(aus_start_date, aus_end_date),
measurements=["VV", "VH", "mask"],
group_by="solar_day",
output_crs="EPSG:32755",
resolution=(20, -20),
dask_chunks={},
)
aus_ds
<xarray.Dataset> Size: 34MB
Dimensions: (time: 5, y: 1062, x: 706)
Coordinates:
* time (time) datetime64[ns] 40B 2020-06-02T09:09:19.445999 ... 202...
* y (y) float64 8kB 5.326e+06 5.326e+06 ... 5.347e+06 5.347e+06
* x (x) float64 6kB 5.201e+05 5.201e+05 ... 5.06e+05 5.06e+05
spatial_ref int32 4B 32755
Data variables:
VV (time, y, x) float32 15MB dask.array<chunksize=(1, 1062, 706), meta=np.ndarray>
VH (time, y, x) float32 15MB dask.array<chunksize=(1, 1062, 706), meta=np.ndarray>
mask (time, y, x) uint8 4MB dask.array<chunksize=(1, 1062, 706), meta=np.ndarray>
Attributes:
crs: EPSG:32755
grid_mapping: spatial_ref
Antarctica¶
Here, we use a bounding box over Canada Glacier in eastern Antarctica, with a date range of June 2018 through July 2018.
ant_bbox = BoundingBox(
left=162.8555,
bottom=-77.6376,
right=163.0801,
top=-77.5813,
crs="EPSG:4326"
)
ant_start_date = "2018-06-01"
ant_end_date = "2018-07-31"
To load the data, we use dc.load. With the dc.load command, you can specify:
- product: the product, or list of products, to load from.
- lon: the longitude range to load the data for. If not specified, the product's maximum range will be used.
- lat: the latitude range to load the data for. If not specified, the product's maximum range will be used.
- time: the date range to load the data for. If not specified, the product's maximum range will be used.
- measurements: the measurements to load from the data (e.g. VV). If not specified, all measurements will be loaded.
- group_by: how to group loaded data. For Antarctica, it is preferred to group bursts by scene; the team is still developing this functionality. The next best option is to group by solar_day, which will ensure all bursts captured on the same solar day are grouped together under one time-stamp (this is not ideal for Antarctica, but is sufficient for now).
- output_crs: the coordinate reference system (CRS) to project loaded data to. If not specified, the data's native CRS will be used.
- resolution: the resolution to load the data at, in the same units as the chosen CRS. If not specified, the data's native resolution will be used.
- dask_chunks={}: request that the data be lazily loaded: an xarray showing the expected dimensions and measurements will be returned, and data will be computed when used. If not specified, data will be loaded into memory.
Note: When selecting a CRS for data over Antarctica, we recommend "EPSG:3031", which matches the data's native projection.
# Lazy load our filtered data
ant_ds = dc.load(
product="ga_s1_iw_hh_c0",
x=(ant_bbox.left, ant_bbox.right),
y=(ant_bbox.bottom, ant_bbox.top),
time=(ant_start_date, ant_end_date),
measurements=["HH", "mask"],
group_by="solar_day",
output_crs="EPSG:3031",
resolution=(20, -20),
dask_chunks={},
)
ant_ds
<xarray.Dataset> Size: 3MB
Dimensions: (time: 5, y: 374, x: 344)
Coordinates:
* time (time) datetime64[ns] 40B 2018-06-03T12:47:55.044581 ... 201...
* y (y) float64 3kB -1.296e+06 -1.296e+06 ... -1.288e+06 -1.288e+06
* x (x) float64 3kB 3.992e+05 3.992e+05 ... 3.924e+05 3.924e+05
spatial_ref int32 4B 3031
Data variables:
HH (time, y, x) float32 3MB dask.array<chunksize=(1, 374, 344), meta=np.ndarray>
mask (time, y, x) uint8 643kB dask.array<chunksize=(1, 374, 344), meta=np.ndarray>
Attributes:
crs: EPSG:3031
grid_mapping: spatial_ref
Loading data in memory¶
Once you are happy with the area of interest, CRS, resolution, bands, and so on, you can load the data into memory using xarray's .compute operation.
This is valuable once you are ready to apply transformations to the data, or wish to visualise it, as it avoids re-reading the data from storage every time it is used.
aus_ds = aus_ds.compute()
ant_ds = ant_ds.compute()
Transforming and visualising loaded data¶
The GA Sentinel-1 backscatter products are provided as linear gamma0 values. It is common to apply some transformations, such as masking, speckle filtering, and converting from linear scale to decibels. For collection 0, converting to other normalisation conventions (e.g. sigma0 and beta0) is not currently available.
If you need either sigma0 or beta0 for your work, please contact digitalearthantarctica@ga.gov.au with your request and information about your application, to provide context for why you need an alternative normalisation convention. This helps us understand demand for these products, which may influence whether we deliver them in future.
Visualising¶
To see the full timeseries for a particular band, the following plotting command can be used:
Australia¶
The dark areas are the two lakes.
aus_ds.VV.plot.imshow(col="time", col_wrap=3, robust=True, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x7f643d70e6e0>
Antarctica¶
The dark area corresponds to the glacier. The bright segments are likely due to layover as there is significant terrain in this region.
ant_ds.HH.plot.imshow(col="time", col_wrap=3, robust=True, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x7f642744d4e0>
Speckle filtering¶
Speckle filtering aims to reduce the noise present in SAR images. One commonly applied filter is the Lee filter. We have supplied a Python file that contains the Lee filter definition, as well as a function to apply it to xarrays.
In the two examples below, we use a window size of 5 pixels. Because the Antarctic example covers a smaller area, it appears more blurred than the Australian example.
from speckle_filters import apply_lee_filter
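The supplied speckle_filters module is not reproduced here. As an illustration of what a basic Lee filter involves, a minimal sketch using scipy's uniform_filter might look like the following; the supplied apply_lee_filter may differ in detail.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img: np.ndarray, size: int = 5) -> np.ndarray:
    """Apply a basic Lee speckle filter to a 2D array using a uniform window."""
    img_mean = uniform_filter(img, size)
    img_sqr_mean = uniform_filter(img**2, size)
    img_variance = img_sqr_mean - img_mean**2   # local variance in each window
    overall_variance = np.var(img)              # global variance of the image
    # Weight each pixel by how much of its local variance is signal vs speckle
    weights = img_variance / (img_variance + overall_variance)
    return img_mean + weights * (img - img_mean)
```

To apply such a function to a loaded dataset you could loop over time steps, or use xarray's apply_ufunc, which is broadly what the supplied helper handles for you.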
We also define a plotting function that allows us to compare the filtered and unfiltered bands:
def compare_bands_plot(ds, time_index, band1, band2, cmap="Greys_r"):
ds_timestep = ds.isel(time=time_index)
fig, axes = plt.subplots(1, 2, figsize=(10, 5))
ds_timestep[band1].plot(ax=axes[0], cmap=cmap, robust=True)
ds_timestep[band2].plot(ax=axes[1], cmap=cmap, robust=True)
plt.tight_layout()
plt.show()
Australia¶
In this example, we apply the Lee filter using a uniform filter window size of 5 pixels.
aus_ds["VV_filtered"] = apply_lee_filter(aus_ds.VV, size=5)
compare_bands_plot(ds=aus_ds, time_index=0, band1="VV", band2="VV_filtered")
Antarctica¶
In this example, we apply the Lee filter using a uniform filter window size of 5 pixels.
ant_ds["HH_filtered"] = apply_lee_filter(ant_ds.HH, size=5)
compare_bands_plot(ds=ant_ds, time_index=0, band1="HH", band2="HH_filtered")
Masking¶
The GA Sentinel-1 backscatter product comes with a mask that indicates invalid pixels, along with pixels impacted by layover and shadow.
The masks have the following values:
| Value | Property |
|---|---|
| 0 | Valid |
| 1 | Shadow |
| 2 | Layover |
| 3 | Shadow and layover |
| 255 / NaN | Invalid |
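Before applying the mask, it can help to quantify how much of each scene falls in each class. The helper below is our own illustration (not part of the product or the datacube API), shown on a small synthetic mask; it can equally be passed a single time step of the mask band, e.g. mask_summary(aus_ds.mask.isel(time=0)).

```python
import numpy as np

MASK_CLASSES = {0: "Valid", 1: "Shadow", 2: "Layover",
                3: "Shadow and layover", 255: "Invalid"}

def mask_summary(mask) -> dict:
    """Return the percentage of pixels in each mask class."""
    values, counts = np.unique(np.asarray(mask), return_counts=True)
    total = counts.sum()
    return {MASK_CLASSES.get(int(v), f"Unknown ({v})"): 100 * c / total
            for v, c in zip(values, counts)}

# Synthetic 4x4 mask: mostly valid, with a little shadow, layover, and no-data
demo = np.array([[0, 0, 0, 1],
                 [0, 0, 2, 2],
                 [0, 0, 0, 0],
                 [0, 3, 0, 255]], dtype=np.uint8)
mask_summary(demo)  # → {'Valid': 68.75, 'Shadow': 6.25, 'Layover': 12.5, ...}
```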
The following code displays the masks and shows how to apply them.
Australia¶
Viewing the masks for each observation, there is very little layover or shadow in this area. This is consistent with there not being much terrain in the chosen area.
aus_ds.mask.plot.imshow(
col="time",
col_wrap=3,
levels=[-0.5, 0.5, 1.5, 2.5, 3.5],
cbar_kwargs={'ticks': [0, 1, 2, 3]}
)
<xarray.plot.facetgrid.FacetGrid at 0x7f643c5954e0>
Applying the mask¶
To apply the mask, we use xarray's where function, which takes the condition, followed by the values to keep if the condition is true, followed by the values to use if the condition is false.
The following code creates a new band, VV_filtered_masked, that keeps the original VV_filtered values where the mask is equal to 0, and replaces values with NaN otherwise.
For plotting, this time we use a non-grey colour bar to make the effect of the masking more obvious (although there is little masking in this scene).
aus_ds["VV_filtered_masked"] = xr.where(aus_ds.mask==0, aus_ds.VV_filtered, np.nan)
aus_ds.VV_filtered_masked.isel(time=0).plot.imshow(robust=True)
<matplotlib.image.AxesImage at 0x7f643cb23910>
Antarctica¶
There is significant terrain in the region that may introduce bright and dark artefacts from layover and shadow. Viewing the mask shows there are multiple areas flagged as affected by layover, and it is important to replace these values with no-data (NaN) before continuing with any analysis.
ant_ds.mask.plot.imshow(
col="time",
col_wrap=3,
levels=[-0.5, 0.5, 1.5, 2.5, 3.5],
cbar_kwargs={'ticks': [0, 1, 2, 3]}
)
<xarray.plot.facetgrid.FacetGrid at 0x7f6427a2ea40>
Applying the mask¶
To apply the mask, we use xarray's where function, which takes the condition, followed by the values to keep if the condition is true, followed by the values to use if the condition is false.
The following code creates a new band, HH_filtered_masked, that keeps the original HH_filtered values where the mask is equal to 0, and replaces values with NaN otherwise.
Each observation has its own mask, which will be applied when using this approach.
For plotting, this time we use a non-grey colour bar to make the effect of the masking more obvious.
ant_ds["HH_filtered_masked"] = xr.where(ant_ds.mask==0, ant_ds.HH_filtered, np.nan)
ant_ds.HH_filtered_masked.isel(time=0).plot.imshow(robust=True)
<matplotlib.image.AxesImage at 0x7f643e2bb280>
Converting to decibels¶
GA's Sentinel-1 backscatter product is provided as linear gamma0. For some analyses, it may be beneficial to convert the linear backscatter to decibels (dB).
The conversion equation is $$\text{backscatter}_{\text{dB}} = 10 \times \log_{10}(\text{backscatter}_\text{linear})$$
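As a quick sanity check of the formula with plain numpy (not tied to the loaded datasets), each factor of ten in linear backscatter corresponds to 10 dB:

```python
import numpy as np

# Linear gamma0 values spanning two orders of magnitude
linear = np.array([1.0, 0.1, 0.01])
db = 10 * np.log10(linear)  # 1.0 → 0 dB, 0.1 → -10 dB, 0.01 → -20 dB
```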
Australia¶
aus_ds["VV_filtered_masked_db"] = 10*np.log10(aus_ds.VV_filtered_masked)
aus_ds.VV_filtered_masked_db.plot.imshow(col="time", col_wrap=3, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x7f643e2afb20>
Antarctica¶
ant_ds["HH_filtered_masked_db"] = 10*np.log10(ant_ds.HH_filtered_masked)
ant_ds.HH_filtered_masked_db.plot.imshow(col="time", col_wrap=3, cmap="Greys_r")
<xarray.plot.facetgrid.FacetGrid at 0x7f643c1b84f0>
Exporting¶
Once you have generated the data you need, you may wish to export it to enable further analysis.
The odc-geo Python library provides a useful function, write_cog, for exporting Cloud Optimised GeoTIFFs (COGs).
First, we'll create a folder to store the outputs in.
pathlib.Path("outputs").mkdir(exist_ok=True)
Single time step¶
The write_cog function is designed to export a single time step.
The following code shows how to isolate the first time step and export it.
# Australia
write_cog(
aus_ds.VV_filtered_masked_db.isel(time=0),
"outputs/STAC_Australia_VV_filtered_masked_db_t0.cog",
overwrite=True
)
# Antarctica
write_cog(
ant_ds.HH_filtered_masked_db.isel(time=0),
"outputs/STAC_Antarctica_HH_filtered_masked_db_t0.cog",
overwrite=True
)
PosixPath('outputs/STAC_Antarctica_HH_filtered_masked_db_t0.cog')
Multiple time steps¶
You can wrap the write_cog function in a loop to export every time step in an xarray.
# Australia
for timestep in range(len(aus_ds.time)):
filename = f"outputs/STAC_Australia_VV_filtered_masked_db_t{timestep}.cog"
array = aus_ds.VV_filtered_masked_db.isel(time=timestep)
write_cog(array, filename, overwrite=True)
# Antarctica
for timestep in range(len(ant_ds.time)):
filename = f"outputs/STAC_Antarctica_HH_filtered_masked_db_t{timestep}.cog"
array = ant_ds.HH_filtered_masked_db.isel(time=timestep)
write_cog(array, filename, overwrite=True)